Memory Ordering in Modern Microprocessors
Abstract
Memory accesses are among the slowest of a CPU’s operations because Moore’s law has increased CPU instruction performance at a much greater rate than it has increased memory performance. As a result, memory operations have become increasingly expensive compared to simple register-to-register instructions. Modern CPUs sport increasingly large caches to reduce the overhead of these expensive memory accesses.

These caches can be thought of as simple hardware hash tables with fixed-size buckets and no chaining, as shown in Figure 1. This cache has sixteen “lines” and two “ways” for a total of 32 “entries”, each entry containing a single 256-byte “cache line”, which is a 256-byte-aligned block of memory. This cache line size is a little on the large side, but makes the hexadecimal arithmetic much simpler. In hardware parlance, this is a two-way set-associative cache, and is analogous to a software hash table with sixteen buckets, where each bucket’s hash chain is limited to at most two elements. Since this cache is implemented in hardware, the hash function is extremely simple: extract four bits from the memory address.

In Figure 1, each box corresponds to a cache entry, which can contain a 256-byte cache line. However, a cache entry can be empty, as indicated by the empty boxes in the figure. The rest of the boxes are flagged with the memory address of the cache line that they contain. Since the cache lines must be 256-byte aligned, the low eight bits of each address are zero, and the choice of hardware hash function means that the next-higher four bits match the hash line number.

The situation depicted in the figure might arise if the program’s code were located at addresses 0x43210E00 through 0x43210EFF, and this program accessed data sequentially from 0x12345000 through 0x12345EFF. Suppose that the program were now to access location...

[Figure 1: cache layout, with hash lines 0x0 through 0xF and Way 0]
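The hash function and two-way layout described in the abstract can be sketched in a few lines of Python. This is only an illustrative model, not the hardware's actual logic: the names `hash_line`, `line_address`, and `TwoWayCache` are hypothetical, and the replacement policy (evict the line that was loaded first) is an assumption, since the text above does not say how a full hash line chooses a victim.

```python
LINE_BITS = 8   # 256-byte cache lines: the low eight address bits are the offset
SET_BITS = 4    # sixteen hash lines: the next-higher four bits select the line
NUM_WAYS = 2    # two-way set-associative: at most two entries per hash line


def line_address(addr: int) -> int:
    """Round an address down to its 256-byte-aligned cache-line address."""
    return addr & ~((1 << LINE_BITS) - 1)


def hash_line(addr: int) -> int:
    """The 'hash function': extract four bits just above the line offset."""
    return (addr >> LINE_BITS) & ((1 << SET_BITS) - 1)


class TwoWayCache:
    """Toy model: sixteen hash lines, each holding at most two cache lines."""

    def __init__(self) -> None:
        self.sets = [[] for _ in range(1 << SET_BITS)]

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, load the line and return False."""
        tag = line_address(addr)
        ways = self.sets[hash_line(addr)]
        if tag in ways:
            return True
        if len(ways) == NUM_WAYS:
            # Assumed policy (not stated in the source): evict the oldest line.
            ways.pop(0)
        ways.append(tag)
        return False


if __name__ == "__main__":
    cache = TwoWayCache()
    cache.access(0x43210E00)                      # program code
    for addr in range(0x12345000, 0x12345F00, 0x100):
        cache.access(addr)                        # sequential data accesses
    for index, ways in enumerate(cache.sets):
        print(hex(index), [hex(tag) for tag in ways])
```

Running the access pattern from the abstract through this model leaves one data line in each of hash lines 0x0 through 0xD, both the code line and a data line in hash line 0xE, and hash line 0xF empty.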
Similar resources
Compiler-managed memory system for software-exposed architectures
Microprocessors must exploit both instruction-level parallelism (ILP) and memory parallelism for high performance. Sophisticated techniques for ILP have boosted the ability of modern-day microprocessors to exploit ILP when available. Unfortunately, improvements in memory parallelism in microprocessors have lagged behind. This thesis explains why memory parallelism is hard to exploit in micropro...
Microarchitectural Innovations: Boosting Microprocessor Performance Beyond Semiconductor Technology Scaling
Semiconductor technology scaling provides faster and more plentiful transistors to build microprocessors, and applications continue to drive the demand for more powerful microprocessors. Weaving the “raw” semiconductor material into a microprocessor that offers the performance needed by modern and future applications is the role of computer architecture. This paper overviews some of the microar...
The effects of memory-access ordering on multiple-issue uniprocessor performance
We study the effect of memory access ordering policies on processor performance. Relaxed ordering policies increase available instruction-level parallelism, but such policies must be evaluated subject to their effect on memory consistency, since virtually all microprocessors are designed to be compatible with shared memory multiprocessor systems, even uniprocessor desktop computers are constra...
Effective Instruction Scheduling With Limited Registers
Effective global instruction scheduling techniques have become an important component in modern compilers for exposing more instruction-level parallelism (ILP) and exploiting the ever-increasing number of parallel function units. Effective register allocation has long been an essential component of a good compiler for reducing memory references. While instruction scheduling and register allocati...
Low-power memory hierarchies: an argument for second-level caches
With the availability of high-performance, low-power microprocessors, portable computing is becoming commonplace. The prevalence of portable computers makes them the most obvious examples of systems in which power requirements are a significant design issue. This paper addresses the power tradeoffs of an important component of modern memory hierarchies: second-level caches. Thought by some to incr...
Publication date: 2005